Skip to content

Replace smallest-first eviction with LRU in FileBasedCacheManager#152

Merged
JacksonKaunismaa merged 3 commits intomainfrom
fix/cache-manager-self-eviction
Feb 16, 2026
Merged

Replace smallest-first eviction with LRU in FileBasedCacheManager#152
JacksonKaunismaa merged 3 commits intomainfrom
fix/cache-manager-self-eviction

Conversation

@JacksonKaunismaa
Copy link
Contributor

@JacksonKaunismaa JacksonKaunismaa commented Feb 15, 2026

Summary

  • Replaces the smallest-first in-memory eviction strategy with LRU (OrderedDict), fixing a KeyError crash and reducing cache thrashing
  • Adds touch() method called on cache hits in maybe_load_cache to maintain correct LRU ordering
  • Simplifies the eviction layer: 3 methods (add_entry, _evict_lru, touch) replace 3 (add_entry, remove_entry, free_space_for)

Problem

add_entry didn't handle re-adding a bin that was already in memory (which happens when save_cache grows a bin on disk and maybe_load_cache reloads it). Three bugs resulted:

  1. Self-eviction crash: The old (smaller) tracked size made the bin the prime eviction candidate. The eviction loop popped it from in_memory_cache while add_entry was still running, then re-added it to sizes — leaving the two dicts out of sync. Next eviction cycle: KeyError at line 150.
  2. Size double-counting: Re-adding without subtracting the old size inflated total_usage_mb, triggering premature eviction.
  3. Oversized entry leak: Entries exceeding the cache limit stayed in in_memory_cache but were invisible to size tracking.

Beyond the crash, smallest-first eviction is a poor policy for this workload: small bins (often the current working set) get evicted while large stale bins persist, causing thrashing under memory pressure.

Fix

The first commit is a minimal 6-line fix that prevents the crash within the existing design. The second commit replaces the eviction layer entirely with LRU, which also fixes the crash but additionally eliminates the thrashing behavior.

Test plan

  • 7 unit tests in tests/test_cache_manager.py: 4 regression tests for the crash/double-counting/leak bugs, 3 tests for LRU ordering and touch() behavior
  • Verified all 7 tests pass with the LRU rewrite
  • Verified 3 of 4 regression tests fail on the original code (confirming they catch real bugs)
  • End-to-end: old code crashes with KeyError at max_mem_usage_mb=5 on 700+ pair variants; LRU rewrite completes all 12 variants

add_entry did not handle re-adding an existing bin (which happens when
save_cache writes to disk and a later maybe_load_cache reloads the grown
bin). Three bugs resulted:

1. Self-eviction: the old (smaller) size in self.sizes made the bin a
   prime candidate for smallest-first eviction during free_space_for,
   removing it from in_memory_cache while add_entry was still running.
   The subsequent re-addition to self.sizes left the two dicts out of
   sync, causing a KeyError on the next eviction cycle.

2. Double-counting: when the bin was not the smallest and survived
   eviction, self.sizes was overwritten with the new size but
   total_usage_mb had the new size added without subtracting the old,
   inflating the counter and triggering premature eviction.

3. Memory leak on oversized entries: when free_space_for returned False,
   the entry remained in in_memory_cache but was never tracked in sizes,
   leaking memory invisibly.

Fix: remove old size tracking before the eviction check, and clean up
in_memory_cache when the entry cannot fit.
The previous memory management layer had three bugs caused by add_entry
not handling re-added bins (when save_cache grows a bin on disk and
maybe_load_cache reloads it):

1. Self-eviction: the eviction loop could pick the entry being added
   (its old small size made it the smallest), removing it from
   in_memory_cache while add_entry continued to re-add it to sizes.
   Next eviction cycle: KeyError.

2. Double-counting: re-adding without subtracting the old size inflated
   total_usage_mb, triggering premature eviction.

3. Oversized entry leak: entries that exceeded the cache limit stayed
   in in_memory_cache but were invisible to the size tracker.

This commit replaces the entire eviction layer:

- OrderedDict for LRU ordering instead of smallest-first via min()
- Remove old tracking before eviction check (prevents self-eviction)
- touch() method for cache hits in maybe_load_cache
- Simpler code: 3 methods (add_entry, _evict_lru, touch) replace 3
  (add_entry, remove_entry, free_space_for)

Verified end-to-end: old code crashes with KeyError at max_mem_usage_mb=5
on 700+ pair variants; new code completes all 12 variants.
@JacksonKaunismaa JacksonKaunismaa changed the title Fix self-eviction crash in FileBasedCacheManager.add_entry Replace smallest-first eviction with LRU in FileBasedCacheManager Feb 15, 2026
@JacksonKaunismaa JacksonKaunismaa merged commit 024ebf1 into main Feb 16, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant